🖥 PDF Craft is a Python library for converting PDFs (primarily scanned books) to Markdown and EPUB, using local AI models and an LLM to structure the content. GitHub
Key features
- Text and layout extraction: uses DocLayout-YOLO combined with its own algorithms to detect and filter out headers, footers, footnotes, and page numbers
- Local OCR: recognizes the text on each page via OnnxOCR, with optional GPU (CUDA) acceleration
- Reading order detection: uses LayoutReader to assemble the text flow in the order a human reader would follow
- Conversion to Markdown: generates a .md file with relative links to images (illustrations, tables, formulas) stored in an assets folder
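The last item can be illustrated with a minimal sketch of writing a Markdown file whose image links are relative to the file itself. This does not use PDF Craft's API: the function name `write_markdown`, the `(kind, value)` block tuples, and the `assets` folder name are all assumptions made for illustration.

```python
import os
from pathlib import Path


def write_markdown(blocks, md_path, assets_dir):
    """Write text and image blocks to a Markdown file (hypothetical helper).

    blocks: list of ("text", str) or ("image", file_name) tuples,
            assumed to be already sorted in reading order.
    Image links are written relative to the folder containing md_path.
    """
    md_path = Path(md_path)
    parts = []
    for kind, value in blocks:
        if kind == "text":
            parts.append(value)
        elif kind == "image":
            image_path = Path(assets_dir) / Path(value).name
            # Compute the link relative to the Markdown file's folder.
            rel = os.path.relpath(image_path, start=md_path.parent)
            # Markdown links use forward slashes on every platform.
            parts.append(f"![]({Path(rel).as_posix()})")
    # Separate blocks with a blank line so each becomes its own paragraph.
    md_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return md_path
```

Because the links are relative, the Markdown file and its assets folder can be moved together without breaking the images.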
Installation and requirements: Python ≥ 3.10 (3.10.16 recommended).
Run pip install pdf-craft and pip install onnxruntime==1.21.0 (or onnxruntime-gpu==1.21.0 for CUDA).
For the EPUB pipeline, you need access to an LLM service (for example, DeepSeek).
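The Python and ONNX Runtime requirements above can be checked before running a conversion. The following is a small sketch, not part of PDF Craft: the function name `check_environment` and the report keys are assumptions, while `onnxruntime.get_available_providers()` is the standard ONNX Runtime call for listing execution providers.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the interpreter and ONNX Runtime look usable (hypothetical helper)."""
    report = {
        # The post states Python >= 3.10 is required.
        "python_ok": sys.version_info >= (3, 10),
        "onnxruntime_installed": importlib.util.find_spec("onnxruntime") is not None,
        "cuda_available": False,
    }
    if report["onnxruntime_installed"]:
        try:
            import onnxruntime

            # The CUDA provider is listed only when the GPU build can be used.
            providers = onnxruntime.get_available_providers()
            report["cuda_available"] = "CUDAExecutionProvider" in providers
        except ImportError:
            # The package was found but could not be loaded.
            report["onnxruntime_installed"] = False
    return report


if __name__ == "__main__":
    print(check_environment())
```

If `cuda_available` is false on a machine with a GPU, the CPU-only onnxruntime package is probably installed instead of onnxruntime-gpu.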
Among the actives, Ascendas REIT sank 0.64 percent, while CapitaLand Integrated Commercial Trust plummeted 1.42 percent, City Developments plunged 1.12 percent, Dairy Farm International tumbled 0.86 percent, DBS Group skidded 0.68 percent, Genting Singapore retreated 0.67 percent, Hongkong Land climbed 1.30 percent, Mapletree Commercial Trust lost 0.47 percent, Mapletree Logistics Trust tanked 0.95 percent, Oversea-Chinese Banking Corporation dropped 0.61 percent, SATS rose 0.24 percent, SembCorp Industries shed 0.54 percent, Singapore Airlines surrendered 0.79 percent, Singapore Exchange slid 0.30 percent, Singapore Press Holdings declined 1.03 percent, Singapore Technologies Engineering dipped 0.26 percent, SingTel advanced 0.81 percent, United Overseas Bank fell 0.39 percent, Wilmar International eased 0.24 percent, Yangzijiang Shipbuilding jumped 1.42 percent and Keppel Corp, Thai Beverage, CapitaLand and Comfort DelGro were unchanged.
Tata Power, whose core business is generating, transmitting, and distributing electricity, has made no money for investors in the last decade. That is a poor record considering it is one of the largest power generation companies in the country. One of the reasons is the company's huge debt, which stood at ₹43,559 crore at the end of March 2021, compared with the company's market capitalisation of ₹44,447 crore.